Unsupervised learning of morphological families: comparison of methods and multilingual aspects
Abstract
This article describes MorphoClust and MorphoNet, two methods for the unsupervised learning of morphological families. MorphoClust builds families through successive groupings, in a manner similar to hierarchical agglomerative clustering. MorphoNet, by contrast, is based on community detection in lexical networks. The nodes of these networks represent words, and the edges represent morphological transformation rules acquired automatically from orthographically similar words. We apply the two methods to a bilingual English-German lexicon, both separately and in combination, and evaluate the results using the CELEX lexical database.
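The lexical-network idea in the abstract can be illustrated with a small sketch. This is not the authors' MorphoNet implementation: the function names (`suffix_rule`, `build_network`, `families`), the thresholds `min_stem` and `min_rule_count`, and the restriction to suffix rewrite rules are all assumptions made for illustration, and plain connected components stand in for a real community detection algorithm.

```python
from collections import defaultdict
from itertools import combinations


def suffix_rule(a, b, min_stem=3):
    """Return a (suffix_a, suffix_b) rewrite rule if a and b share a
    common prefix of at least min_stem characters, else None.
    (Hypothetical helper; the paper's rule format is not given here.)"""
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    if i < min_stem:
        return None
    return (a[i:], b[i:])


def build_network(words, min_rule_count=2):
    """Nodes are words; an edge links two words when their suffix rule
    is attested for at least min_rule_count word pairs."""
    pairs_by_rule = defaultdict(list)
    for a, b in combinations(sorted(set(words)), 2):
        rule = suffix_rule(a, b)
        if rule is not None:
            pairs_by_rule[rule].append((a, b))
    graph = {w: set() for w in words}
    for pairs in pairs_by_rule.values():
        if len(pairs) >= min_rule_count:
            for a, b in pairs:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def families(graph):
    """Group words by connected component, a crude stand-in for
    community detection (assumption, not the paper's algorithm)."""
    seen, result = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        result.append(sorted(component))
    return result


words = ["walk", "walks", "walked", "talk", "talks", "talked", "cat"]
print(families(build_network(words)))
```

On this toy lexicon the sketch groups the `walk` forms together, the `talk` forms together, and leaves `cat` on its own. Requiring a rule to recur across several word pairs is one simple way to filter out accidental orthographic similarity.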
Similar resources
Comparison school bonding and interpersonal problems in students with unsupervised and abused families with normal
This study aimed to compare the school bonding and interpersonal problems in students with unsupervised and abused families with normal families in Bandar Lengeh. The sample consisted of 152 normal students and 81 unsupervised or abused students. Normal students were selected by the multi-stage cluster sampling method. Data were collected through two questionnaires: school bonding (Rezaei Shari...
Fast and unsupervised methods for multilingual cognate clustering
In this paper we explore the use of unsupervised methods for detecting cognates in multilingual word lists. We use online EM to train sound segment similarity weights for computing similarity between two words. We tested our online systems on geographically spread sixteen different language groups of the world and show that the Online PMI system (Pointwise Mutual Information) outperforms a HMM ...
Unsupervised Learning of Morphological Forests
This paper focuses on unsupervised modeling of morphological families, collectively comprising a forest over the language vocabulary. This formulation enables us to capture edgewise properties reflecting single-step morphological derivations, along with global distributional properties of the entire forest. These global properties constrain the size of the affix set and encourage formation of t...
Construction of supervised and unsupervised learning systems for multilingual text categorization
Due to the availability of a huge amount of textual data from a variety of sources, users of internationally distributed information regions need effective methods and tools that enable them to discover, retrieve and categorize relevant information, in whatever language and form it may have been stored. This drives a convergence of numerous interests from diverse research communities focusing o...
Unsupervised Multilingual Learning for Morphological Segmentation
For centuries, the deep connection between languages has brought about major discoveries about human communication. In this paper we investigate how this powerful source of information can be exploited for unsupervised language learning. In particular, we study the task of morphological segmentation of multiple languages. We present a nonparametric Bayesian model that jointly induces morpheme s...
Journal title:
- TAL
Volume 51, Issue
Pages -
Publication date: 2010